Concepts in Conceptual Clustering
Author
Abstract
Although it has a relatively short history, conceptual clustering is an especially active area of research in machine learning. There are a variety of ways in which conceptual patterns (the AI contribution to clustering) play a role in the clustering process. Two distinct conceptual clustering paradigms (conceptual sorting of exemplars and concept discovery) are described briefly. Then six types of conceptual clustering algorithms are characterized, attempting to cover the present spectrum of mechanisms used to conceptualize the clustering process.

I CONCEPTUAL CLUSTERING: The New Frontier

Ever since Michalski wrote about conceptual clustering as a new branch of machine learning (Michalski 1980) there has been ever increasing attention to that family of machine learning tasks. Several researchers have been involved in conceptual clustering research, though early research (the next two citations in particular) was not conducted in the name of conceptual clustering. Wolff (1980) describes MK10, an agglomerative hierarchical data compression system that is able to generate conjunctive descriptions of clusters based on co-occurrences of feature values. Lebowitz (1982 and 1983) describes the UNIMEM and IPP systems, which use what he calls Generalization Based Memory to incrementally clump exemplars into overlapping conceptual categories based on predictive features. Michalski and Stepp (1983) describe CLUSTER/2, a conceptual clustering algorithm for building polythetic clusterings (clusterings whose differences depend on discovered conjunctive concepts rather than variations in the value taken by a single attribute). Langley and Sage (1984) describe DISCON, an ID3-like (Quinlan 1983) optimal classification tree builder that forms monothetic hierarchical clusterings given a list of "interesting" attributes. Fisher (1984) describes RUMMAGE, a DISCON-like program that does some generalization over attribute values and uses non-exhaustive search. Stepp (1984) describes CLUSTER/S, a conjunctive conceptual clustering algorithm for use on structured exemplars. Langley, Zytkow, Simon, and Bradshaw (1985) describe GLAUBER, a concept discovery system based partly on MK10, that employs conceptual clumping based on most commonly occurring relations in data. Stepp and Michalski (1986) describe algorithms for incorporating background knowledge and classification goals. Mogensen (1987) describes CLUSTER/CA, a program that forms clusters of structured objects in a goal-directed way through the use of Goal Dependency Networks.

Taken together, there is a large diversity of algorithms that now are described by the term conceptual clustering. Fisher and Langley (1985) provide two views of conceptual clustering (as extended numerical taxonomy, and as concept formation) and also give an enlightened characterization of several conceptual clustering algorithms.

This research was supported in part by the National Science Foundation under grant NSF IST 85-11170.

In the following sections, two somewhat different views of conceptual clustering are described. The first view is that of cluster formation per se, whose goal is the determination of extensionally defined clusters. The conceptual part of the process lies in how the exemplars are agglomerated/divided rather than in how the clusters are described (i.e., the cluster forming mechanism need not maintain any cluster descriptions). The second view is that of concept formation, with exemplars as the catalyst. Under this view clusters are formed according to their conceptual descriptions, i.e., the system must constantly maintain conceptual descriptions of clusters and cluster membership is constrained by the concepts available to describe the results. Following the terminology of psychology, the first view will here be called conceptual sorting. The second view will be called concept discovery. Each in its own way can be said to involve conceptual clustering.
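The contrast between the two views can be made concrete with a minimal sketch. It is an illustration only, not any of the cited systems: the attribute-value exemplars, the function names `covers`, `characterize`, and `members`, and the toy animal data are all assumptions introduced here. Under conceptual sorting a cluster is just a list of exemplars, and a description is computed afterwards if at all; under concept discovery the description comes first and membership follows from it.

```python
from typing import Dict, List

# Hypothetical representation: an exemplar maps attribute names to values,
# and a conjunctive description is a set of attribute=value tests.
Exemplar = Dict[str, str]
Description = Dict[str, str]


def covers(description: Description, exemplar: Exemplar) -> bool:
    """True if the exemplar satisfies every attribute=value test."""
    return all(exemplar.get(attr) == value for attr, value in description.items())


def characterize(cluster: List[Exemplar]) -> Description:
    """Most specific conjunction shared by all members of a non-empty cluster."""
    first, *rest = cluster
    return {
        attr: value
        for attr, value in first.items()
        if all(other.get(attr) == value for other in rest)
    }


def members(concept: Description, exemplars: Dict[str, Exemplar]) -> List[str]:
    """Names of the exemplars covered by a concept."""
    return [name for name, ex in exemplars.items() if covers(concept, ex)]


# Toy data, invented for illustration.
EXEMPLARS: Dict[str, Exemplar] = {
    "sparrow": {"covering": "feathers", "flies": "yes", "legs": "2"},
    "penguin": {"covering": "feathers", "flies": "no", "legs": "2"},
    "bat": {"covering": "fur", "flies": "yes", "legs": "2"},
    "dog": {"covering": "fur", "flies": "no", "legs": "4"},
}

# Conceptual sorting: the cluster is given extensionally, then described.
sorted_cluster = [EXEMPLARS["sparrow"], EXEMPLARS["penguin"]]
description = characterize(sorted_cluster)

# Concept discovery: the concept is maintained, and it determines membership.
discovered = members({"covering": "fur"}, EXEMPLARS)
```

In this sketch the first route never needs `covers` to form the cluster, while the second route cannot form a cluster without it, which is the sense in which membership is constrained by the available concepts.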
II CONCEPTUAL CLUSTERING AS CONCEPT SORTING

The process of clustering is to group exemplars in some interesting way (or ways) such as a hierarchy of categories or a tree structure (dendrogram). Numerical taxonomy readily provides such groupings, but the groups have little or no conceptual interpretation. One view of conceptual clustering proposes to produce interesting groupings and then provide them with a conceptual interpretation. That is, to build extensionally defined categories (by enumerating their members) and then find a conceptual interpretation. Naturally, some subpopulations of exemplars are easier to interpret (i.e., form better conceptual clusters) than others. Fisher (1985) proposes such a view, and states that the two phases (called the aggregation and characterization problems, respectively) are not independent.

That the clustering and characterization phases are not independent (assuming they are separate processes) is precisely one of the facets that distinguishes conceptual clustering from "regular" clustering. Indeed, one can perform statistical clustering, take the extensionally defined resulting clusters and then generate conceptual interpretations for them. There are clustering problems for which this is an acceptable approach: cluster analysis was done exclusively just this way for a long while, with the analyst doing all the interpretation. But in general, concepts derived from independently rendered clusters have potentially messy conceptual characterizations, involving disjunctive conceptual forms (Michalski and Stepp 1983). But one should note that certain patterns of disjunction can be restated as polymorphic concepts ("n of m properties must be present") and some clustering research is directed at finding polymorphic classifications (e.g., Hanson and Bauer 1986). A major reason independently rendered clusters can have rather unappealing conceptual interpretations is that they
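The restatement of a disjunction as a polymorphic "n of m" concept can be illustrated with a short sketch. It rests on assumptions that are not in the source: exemplars are modeled as sets of present properties, "n of m" is read as "at least n of the m listed properties", and the property names `p1`, `p2`, `p3` and all function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence

# Hypothetical representation: an exemplar is the set of properties it has.
Exemplar = FrozenSet[str]


def n_of_m(n: int, properties: Sequence[str], exemplar: Exemplar) -> bool:
    """Polymorphic test: at least n of the listed properties are present."""
    return sum(1 for p in properties if p in exemplar) >= n


def as_disjunction(n: int, properties: Sequence[str]) -> List[FrozenSet[str]]:
    """Equivalent disjunctive form: one conjunction per n-subset of properties."""
    return [frozenset(subset) for subset in combinations(properties, n)]


def matches_disjunction(disjuncts: List[FrozenSet[str]], exemplar: Exemplar) -> bool:
    """True if the exemplar satisfies at least one conjunction."""
    return any(conjunct <= exemplar for conjunct in disjuncts)


PROPERTIES = ["p1", "p2", "p3"]
EXEMPLARS: List[Exemplar] = [
    frozenset({"p1", "p2"}),
    frozenset({"p1", "p3"}),
    frozenset({"p2", "p3"}),
    frozenset({"p1", "p2", "p3"}),
    frozenset({"p1"}),
    frozenset(),
]

# "2 of 3" expands to (p1 and p2) or (p1 and p3) or (p2 and p3).
DISJUNCTS = as_disjunction(2, PROPERTIES)

# Both forms classify every exemplar identically.
agree = all(
    n_of_m(2, PROPERTIES, e) == matches_disjunction(DISJUNCTS, e) for e in EXEMPLARS
)
```

No single conjunction of these properties separates the first four exemplars from the last two, yet the one-line "2 of 3" test does, which is why the polymorphic form can read as a cleaner characterization than the disjunction it replaces.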